Add F16 precision toolkit (AVX2) + ARM NEON specialist agent by AdaWorldAPI · Pull Request #91 · AdaWorldAPI/ndarray

AdaWorldAPI · 2026-04-13T07:18:56Z

simd_avx2.rs — 3 precision tricks, all AVX2-accelerated (additive only):

Trick 1: Double-f16 (Error-Free Split)
f16_double_encode/decode: store value as hi+lo f16 pair
~20-bit effective mantissa (vs 10 stored mantissa bits for a single f16)
f16_double_encode/decode_batch: AVX2 F16C + f32x8 addition
Error: ≤2^{-21} × |value| (vs ≤2^{-11} for single f16)
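
A minimal scalar sketch of the hi+lo split, for illustration only. It is not the code in simd_avx2.rs: the function names and signatures below are hypothetical, and the f16 conversion is done in software instead of with F16C. The idea is that hi is the value rounded to f16 and lo is the rounding residual, itself rounded to f16.

```rust
/// 2^e as an exact f32, valid for -126 <= e <= 127.
fn pow2(e: i32) -> f32 {
    f32::from_bits(((e + 127) as u32) << 23)
}

/// Software f32 -> IEEE binary16 bits, round-to-nearest-even.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32;
    let man = bits & 0x007f_ffff;
    if exp == 0xff {
        // Inf stays Inf; NaN keeps a non-zero mantissa bit.
        let nan_bit: u16 = if man != 0 { 0x0200 } else { 0 };
        return sign | 0x7c00 | nan_bit;
    }
    let e = exp - 127 + 15; // exponent re-biased for f16
    if e >= 0x1f {
        return sign | 0x7c00; // overflow -> infinity
    }
    if e <= 0 {
        if e < -10 {
            return sign; // below half the smallest subnormal -> signed zero
        }
        // Subnormal result: shift the 24-bit significand into place.
        let m = man | 0x0080_0000;
        let shift = (14 - e) as u32;
        let half = m >> shift;
        let rem = m & ((1u32 << shift) - 1);
        let halfway = 1u32 << (shift - 1);
        let round_up = rem > halfway || (rem == halfway && (half & 1) == 1);
        return sign | (half + round_up as u32) as u16;
    }
    let half = ((e as u32) << 10) | (man >> 13);
    let rem = man & 0x1fff;
    let round_up = rem > 0x1000 || (rem == 0x1000 && (half & 1) == 1);
    // A carry out of the mantissa correctly bumps the exponent.
    sign | (half + round_up as u32) as u16
}

/// Software IEEE binary16 bits -> f32 (always exact).
fn f16_bits_to_f32(h: u16) -> f32 {
    let exp = ((h >> 10) & 0x1f) as i32;
    let man = (h & 0x03ff) as f32;
    let mag = match exp {
        0 => man * pow2(-24),
        0x1f => if man == 0.0 { f32::INFINITY } else { f32::NAN },
        _ => (1.0 + man / 1024.0) * pow2(exp - 15),
    };
    if h & 0x8000 != 0 { -mag } else { mag }
}

/// Hypothetical double-f16 encoder: hi = round(v), lo = round(v - hi).
fn encode_double_f16(v: f32) -> (u16, u16) {
    let hi = f32_to_f16_bits(v);
    let residual = v - f16_bits_to_f32(hi);
    (hi, f32_to_f16_bits(residual))
}

/// Hypothetical decoder: widen both halves and add them in f32.
fn decode_double_f16(hi: u16, lo: u16) -> f32 {
    f16_bits_to_f32(hi) + f16_bits_to_f32(lo)
}
```

For example, encoding `std::f32::consts::PI` gives hi = 0x4248 (3.140625), and adding the decoded lo term recovers most of the bits that a single f16 drops. The error bound assumes lo stays in the normal f16 range; for very small inputs lo becomes subnormal or zero and the gain shrinks.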

Trick 2: Kahan-compensated accumulation
f16_kahan_sum: error O(ε) + O(N·ε²) instead of O(N·ε), so effectively independent of count
f16_kahan_dot: AVX2 f32x8 multiply + Kahan-accumulate partial sums
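
A scalar sketch of the compensation step, assuming the f16 inputs have already been widened to f32. The names `kahan_sum` and `kahan_dot` are illustrative stand-ins, not the signatures of f16_kahan_sum/f16_kahan_dot, and the AVX2 lane handling is left out.

```rust
/// Kahan-compensated sum: `c` carries the low-order bits lost by each add.
fn kahan_sum(values: &[f32]) -> f32 {
    let mut sum = 0.0f32;
    let mut c = 0.0f32; // running compensation
    for &x in values {
        let y = x - c; // apply the correction from the previous step
        let t = sum + y; // low bits of y may be lost here
        c = (t - sum) - y; // recover what was lost (negated)
        sum = t;
    }
    sum
}

/// Kahan-compensated dot product over two equal-length slices.
/// Each product is still rounded once in f32; only the accumulation
/// is compensated.
fn kahan_dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "slices must have equal length");
    let mut sum = 0.0f32;
    let mut c = 0.0f32;
    for (&x, &y) in a.iter().zip(b) {
        let p = x * y - c;
        let t = sum + p;
        c = (t - sum) - p;
        sum = t;
    }
    sum
}
```

Summing one million copies of 0.1f32 is a quick check: a plain left-to-right f32 sum drifts by several hundred, while the compensated sum stays within one f32 ulp of 100000. Rust does not reassociate floating-point operations by default, so the compensation term is not optimized away.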

Trick 3: Exponent-aligned scaling (F16Scaler)
from_range/from_data: auto-compute scale factor for value range
encode/decode_batch: AVX2 f32x8 scale + F16C convert
Up to ~128× precision improvement for narrow-range data
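
One way such a scaler could work, sketched under assumptions: the scale is taken to be a power of two chosen so that the largest magnitude lands one binade below the f16 maximum. The `Scaler` type, its methods, and the software `round_to_f16` helper are all hypothetical; the real F16Scaler may choose its scale differently.

```rust
/// 2^e as an exact f32, valid for -126 <= e <= 127.
fn pow2(e: i32) -> f32 {
    f32::from_bits(((e + 127) as u32) << 23)
}

/// Round `x` to the nearest IEEE binary16 value, returned as f32.
/// A software stand-in for an F16C encode/decode round trip.
fn round_to_f16(x: f32) -> f32 {
    if x == 0.0 || !x.is_finite() {
        return x;
    }
    // Unbiased exponent, clamped so the subnormal range uses a fixed step.
    let e = (((x.to_bits() >> 23) & 0xff) as i32 - 127).max(-14);
    let quantum = pow2(e - 10); // spacing of f16 values near x
    // Adding and subtracting 1.5 * 2^23 rounds to the nearest even integer.
    let magic = 12_582_912.0f32;
    let q = ((x / quantum + magic) - magic) * quantum;
    if q.abs() > 65504.0 { f32::INFINITY.copysign(x) } else { q }
}

/// Hypothetical power-of-two scaler.
struct Scaler {
    scale: f32,
}

impl Scaler {
    /// Pick 2^k so that `max_abs * 2^k` falls in [2^14, 2^15).
    fn from_range(max_abs: f32) -> Self {
        let e = ((max_abs.to_bits() >> 23) & 0xff) as i32 - 127;
        Scaler { scale: pow2((14 - e).clamp(-126, 127)) }
    }

    /// Scale, then quantize to the f16 grid.
    fn encode(&self, x: f32) -> f32 {
        round_to_f16(x * self.scale)
    }

    /// Undo the scale; exact because the scale is a power of two.
    fn decode(&self, h: f32) -> f32 {
        h / self.scale
    }
}
```

In this sketch the gain comes from keeping small values out of the f16 subnormal range. With data bounded by 1e-5, the value 1e-6 rounds to a subnormal with about 1.3% relative error when stored directly, but stays within the normal-range bound of 2^{-11} after scaling.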

⚠️ NOT FOR GGUF CALIBRATION — BF16 pipeline is separate

.claude/agents/arm-neon-specialist.md:
Complete ARM SBC knowledge: Pi Zero 2W through Pi 5, Orange Pi 3-5
Per-CPU microarchitecture (A53/A72/A76 pipeline differences)
big.LITTLE awareness (RK3399, RK3588)
F16 inline asm trick, codebook strategy per tier, memory budgets

6 new tests passing. No existing code modified.

https://claude.ai/code/session_017ZN5PNEf8boFBgorUZVrFU


AdaWorldAPI merged commit b073060 into master Apr 13, 2026
4 of 10 checks passed

AdaWorldAPI merged 1 commit into master from claude/setup-rust-smart-home-SOPAY
